Improving Category Specific Web Search by Learning Query Modifications

Authors

  • Eric J. Glover
  • Gary William Flake
  • Steve Lawrence
  • William P. Birmingham
  • Andries Kruger
  • C. Lee Giles
  • David M. Pennock
Abstract

Users looking for documents within specific categories may have a difficult time locating valuable documents using general purpose search engines. We present an automated method for learning query modifications that can dramatically improve precision for locating pages within specified categories using web search engines. We also present a classification procedure that can recognize pages in a specific category with high precision, using textual content, text location, and HTML structure. Evaluation shows that the approach is highly effective for locating personal homepages and calls for papers. These algorithms are used to improve category specific search in the Inquirus 2 search engine.

Typical web search engines index millions of pages across a variety of categories, and return results ranked by expected topical relevance. Only a small percentage of these pages may be of a specific category, for example, personal homepages, or calls for papers. A user may examine large numbers of pages about the right topic, but not of the desired category. In this paper, we describe a methodology for category-specific web search. We use a classifier to recognize web pages of a specific category and learn modifications to queries that bias results toward documents in that category. Using this approach, we have developed metasearch tools to effectively retrieve documents in several categories, including personal homepages, calls for papers, research papers, product reviews, and guide or FAQ documents.

For a specific category, our first step is to train a support vector machine (SVM) [16] to classify pages by membership in the desired category. Performance is improved by considering, in addition to words and phrases, the documents’ HTML structure and simple word location information (e.g., whether a word appears near the top of the document). Second, we learn a set of query modifications. For this experiment, a query modification is a set of extra words or phrases added to a user query to increase the likelihood that results of the desired category are ranked near the top. Since not all search engines respond the same way to modifications, we use our classifier to automatically evaluate the results from each search engine, and produce a ranking of search engine and query modification pairs. This approach compensates for differences between performance on the training set and the search engine, which has a larger database and unknown ordering policy.
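The ranking of search engine and query modification pairs described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `modify_query` and `rank_pairs`, the `top_k` cutoff, the quoting of multi-word phrases, and the use of precision over the top results as the score are all assumptions made for illustration. The `in_category` argument stands in for the trained classifier, and each engine is modeled as a plain function from a query string to a list of result pages.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Modification = Tuple[str, ...]            # extra words or phrases (hypothetical type)
SearchFn = Callable[[str], List[str]]     # query string -> ranked result pages


def modify_query(query: str, modification: Modification) -> str:
    """Append the extra terms to the user query, quoting multi-word phrases."""
    extras = [f'"{term}"' if " " in term else term for term in modification]
    return " ".join([query, *extras])


def rank_pairs(
    queries: Sequence[str],
    engines: Dict[str, SearchFn],
    modifications: Sequence[Modification],
    in_category: Callable[[str], bool],
    top_k: int = 10,
) -> List[Tuple[str, Modification, float]]:
    """Score every (engine, modification) pair by the fraction of its
    top_k results that the classifier accepts, best pair first."""
    scores = []
    for engine_name, search in engines.items():
        for mod in modifications:
            hits = 0
            total = 0
            for query in queries:
                results = search(modify_query(query, mod))[:top_k]
                hits += sum(1 for page in results if in_category(page))
                total += len(results)
            precision = hits / total if total else 0.0
            scores.append((engine_name, mod, precision))
    # sorted() is stable, so tied pairs keep their original order.
    return sorted(scores, key=lambda item: item[2], reverse=True)
```

Because the score is computed from what each engine actually returns, the same modification can rank high on one engine and low on another, which is the effect the abstract attributes to differing databases and ordering policies.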


Similar resources

Analysis of users’ query reformulation behavior in Web with regard to Wholistic/analytic cognitive styles, Web experience, and search task type

Background and Aim: The basic aim of the present study is to investigate users’ query reformulation behavior with regard to wholistic-analytic cognitive styles, search task type, and experience variables in using the Web. Method: This study is applied research using the survey method. A total of 321 search queries were submitted by 44 users. Data collection tools were Riding’s Cognitive Style A...


Web pages ranking algorithm based on reinforcement learning and user feedback

The main challenge of a search engine is ranking web documents to provide the best response to a user's query. Despite the huge number of results extracted for a user's query, only a small number of the first results are examined by users; therefore, placing the related results in the first ranks is of great importance. In this paper, a ranking algorithm based on the reinforcement le...


RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

The principal aim of a search engine is to provide sorted results according to the user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to the user query. The novelty of this paper is to provide a user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...


Learning Query Reformulations for Personalized Web Search Using a Probabilistic Inference Network

The continuous development of the Internet has resulted in an exponential increase in the amount of available pages and made it into one of the prime sources of information. A popular way to access this information is by submitting queries to a search engine which retrieves a set of documents. However, most search engines do not consider the specific information needs of the user and retrieve th...


Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...



Publication date: 2001